Semi-Automated Transcription Generation for Pashto Cursive Script
Abstract
Training and benchmarking Optical Character Recognition (OCR) systems for new scripts such as Pashto usually requires a large amount of transcription data. Real image data is mostly acquired through scanning. Supervised training requires ground truth for each scanned image. This ground truth is usually created by transcribing the documents manually, which is an overwhelmingly laborious task. This work introduces a semi-automated procedure for transcribing Pashto document images using a long short-term memory (LSTM) network architecture. The procedure is applied to the transcription of 1000 images containing Pashto ligatures, and it improves transcription performance to around three times that of the manual method.
Similar resources
Robust Optical Recognition of Cursive Pashto Script Using Scale, Rotation and Location Invariant Approach
The presence of a large number of unique shapes called ligatures in cursive languages, along with variations due to scaling, orientation and location provides one of the most challenging pattern recognition problems. Recognition of the large number of ligatures is often a complicated task in oriental languages such as Pashto, Urdu, Persian and Arabic. Research on cursive script recognition ofte...
Invariant Handwriting Features Useful in Cursive-Script Recognition
The within-writer variability of handwriting forms one of the problems in the automatic recognition of cursive script. Variability can be handled by choosing handwriting features based upon the process of handwriting generation or upon computational models. Handwriting patterns are represented by a sequence of motor actions, i.e., "strokes", which can be identified by invariant segmentation. Ea...
Cursive Script Postal Address Recognition
By Prasun Sinha. Large variations in writing styles and difficulty in segmenting cursive words are the main reasons for cursive script postal address recognition being a challenging task. A scheme for locating and recognizing words based on over-segmentation followed by dynamic programming is proposed. This technique is being used for zip code extraction as ...
Building a Perception Based Model for Reading Cursive Script
This paper presents a new perception based model for reading cursive script. We describe the organization of our pseudo-neuronal system and show the role of the activation mechanism in perceiving and reading cursive script. We have introduced into our model some characteristics specific to cursive script. First, we use more appropriate features such as ascenders and descenders. Second, we deal with ...
Publication date: 2016